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Abstract 

Total energy electronic structure calculations, based on density functional theory or on the more 
empirical tight binding approach, are generally believed to scale as the cube of the number of 
electrons. By using the localisaton property of the high temperature density matrix we present 
exact deterministic algorithms that scale linearly in one dimension and quadratically in all others. 
We also introduce a stochastic algorithm that scales linearly with system size. These results hold 
for metallic and non metallic systems and are substantiated by numerical calculations on model 
systems. 
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Total energy electronic structure calculations and molecular dynamics simulations based 
on density functional theory (DFT) or on the tight binding approach have been very suc- 
cessful in describing a large variety of phenomena and are finding increasing application 
in many fields of science. However even with present-day computer technology the size of 
the systems that can be studied is very restricted. This is due to the cubic dependence 
of the commonly used algorithms on the number of particles. This makes a daunting task 
of calculating the electronic properties of systems as large as those that are for instance of 
interest to nanotechnology and biochemistry. 

Some 15 years ago it was realized that this need not be so and that linearly scaling 
algorithms could be devised^. Up to now large number of linearly scaling algorithms have 
been proposed 3 , but they are not devoid of problems and for a variety of reasons they are 
not yet routinely used. Most algorithms rely on the fact that the wavefunctions can be 
localized and have an exponential decay leading to a sparse Hamiltonian. This property 
does not hold when the gap between occupied and unoccupied levels vanishes, as in the case 
of metals for which it has proven difficult to obtain linearly scaling algorithms. 

Here we propose a new approach to this problem that does not rely on an ability to localize 
the wavefunctions and is therefore equally applicable to metallic and non-metallic systems. 
We introduce a series of algorithms which defy the commonly accepted wisdom that DFT 
calculations are of O {N 3 ). In fact they are of O (N 2 ) and even of O (N). Furthermore we 
propose a stochastic algorithm that is linear scaling in all dimensions. A feature which sets 
our method apart from others is that it scales with the volume of the system and not with 
the number of electrons. A bonus, if one treats atomic species that are rich in electrons. 

We work at finite temperature 4 in the grand canonical ensemble, where the number of 
particles is controlled by the chemical potential fi and the relevant thermodynamic potential 
for spinless fermions is 4 : 

Q f = -^lndet(l + e /3( ^ H) ) (1) 
where H is a single particle Hamiltonian H = — |V + V (r). In the (3 — > oo limit 

n/ = E e <-/^ ( 2 ) 

i 

where the sum runs over the N lowest occupied states and N is the total number of particles. 
Limiting oneself to the consideration of free electrons in an external field is not as restrictive 
as it might at first seem. In fact it is well known that the full density functional can be 
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obtained from the sum over the occupied orbital energy in Eq. (J2J), if the external potential 
of the Hamiltonian H is the self-consistent DFT potential and double counting terms are 
subtracted^. Since the latter can be calculated with linear effort, finding a method for 
evaluating Qf in O(N) operations solves the algorithmically hard part of the problem. 
We make use of the identity: 

(l + e/»0*-H)) = n (l + e l ^ 2l ~V e K^)) ( 3 ) 
1=1 

which is valid for any even P and the product goes over all the complex P th roots of —1. 
Using this decomposition one finds 



n, = - -=2>det (M(0) 
p i=i 



(4) 



with M(l) = l + e*?^- 1 ) ep^~ H) . This is rather more complicated than Eq. (£[]) but has the 
advantage that it involves the propagator eT^-H) ra ther than the more difficult e^~ H \ 
In fact if P is large enough one can use for ep^~ H ^ one of the many high-temperature 
representations of the exponential operator, such as the one based on the expansion in 
terms of Chebychev polynomials^ or the Trotter decomposition^, as is commonly done in 
the numerical evaluation of path integrals. In this first implementation we shall employ the 
latter, which uses the identity: 



(5) 



- m K (r-r'\ 2 
g 2H 2 f3 



2r2 /3 decays very rapidly and most elements of 



valid to O f(p) V For large P e 
e §;{n-H) can neglected, leading to a sparse matrix which is a key factor for obtaining a 
linear scaling algorithm. Since this property does not depend on the possibility of localizing 
the wavefunctions this makes our method suitable also for metals. Alavi and Frenkel 9 have 
shown that the high temperature density matrix in a cubic lattice of spacing 5 can equally 
well be written as: 



e £(M-») 



(6) 
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e ^( At -y(i)) e -^ e ^(M-^(i)) 



i =j 

if i and j are 
first neighbors 
otherwise 



where i and j are lattice site indices, P is related to the lattice spacing by P = 
3A32h 2 (3/{m5 2 ) and C is a normalization constant. This expression is compatible with 
the approximations made so fare, leads to an elegant lattice model with nearest neighbor 
interactions and at the same time does not make the problem any less complex, nor does it 
alter its scaling behaviour. 

Using this representation of the density matrix it is easily seen that in one dimension 
the matrices M (I) that appear in Eq. (J1J are tridiagonal. Since the determinant of a 
tridiagonal matrice can be computed in O(M) operations, we arrive at the interesting result 
that linear scaling is exact in Id. Moving to higher dimensions the M (I) matrices become 
block tridiagonal where the dimension of each block is m = 

M (d-i)/d _ In gpite of the fact 

that the blocks are very sparse we were unable to calculate the determinant of M (I) in 
less than Mm 2 = M 3 ~~ 2 / d operations. This is only marginally better than the standard 
M 3 scaling. Furthermore in 3d the resulting algorithm has a very unfavorable prefactor, 
which makes this approach unpractical unless substantially improved for instance by better 
exploiting the sparsity of M (I). 

A more favorable scaling can be obtained if one focuses on the response of 0/ with respect 
to appropriate parameters, which is a standard way of calculating physical quantities. For 
instance the number of particles is given by (N) = — the energy by (E) = 8 ^ / ' > + // (N) 
and so on. In general one can write for the value of a property A, conjugated to the field 

which requires the inversion of the sparse matrices M(l) and not the calculation of its 
determinant. The inverse of M{1) can be found if one solves the M sets of linear equa- 
tions M{l)cf) 1 - = ipj where {iftj} is a complete orthonormal basis set and is given by 
M{l)~ l = J2jL\ tfifo]- Using a preconditioned biconjugate gradient method^ and the spar- 
sity of M(l) we find that solving each linear equation takes O (M) operations leading to an 
overall quadratic scaling also in 3d. An efficient preconditioner has proved to be the inverse 
of M f(l) for free particles. Although in this case the full inverse can be evaluated exactly 
using Fourier transforms methods it is computationally expedient to truncate M so 
as to give to Ai"j(Z) _1 the same sparse structure as M(l). As we shall see below the theo- 
retical O (M 2 ) scaling can be demonstrated in practice, unfortunately at the cost of a large 
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prefactor. 

However linear scaling can be obtained if we use a stochastic approach. Taking our cue 
from what is done in quantum chromodynamics we introduce a random vector ip. If the ipi 
are stochastically distributed such that their average satisfies: 

(ViVS) = $j and (Vi> = 0, (8) 

the inverse of M can be written as an expectation value: 

Mil)' 1 = (0V) • (9) 

where the average is taken over the stochastic process and 4> l is the solution of the linear 
equation M{l)<f> 1 = ifi. In principle any distribution that fulfills Eq. (jHJ) allows finding the 
inverse of M{1) as in Eq. @. However in practice it has been noted that the statistical 
error does depend on the choice of the distribution. The one that gave the smaller noise was 
the Si distribution, in which the ipi are distributed as S(ip*ipj — 1)^. This means that the 
variables ipi can take random values e ia on the unit circle with equal probability. Physical 
quantities can then be calculated according to Eq. ((7j) as: 




and therefore one finds overall linear scaling behavior. 

We now substantiate the claims made on the scaling of the different algorithms intro- 
duced here with numerical calculations. Different model potentials have been investigated 
with satisfactory results. Here we report only one calculation done on a periodic potential 
constructed with a superposition of Gaussians SjLi —we~ ( - r ~ Rl ' >2 ^ 2 Q [r 2 c — (r — Ri) 2 ) where 
w = 4 a.u. and the cutoff radius is r c = 6 a.u.. The Gaussian centers Ri are arranged to 
form a cubic lattice and mimic a crystal of N atoms. The spacing between the atomic sites 
is taken to be 45 with 5 = 0.75 a.u. . Periodic boundary conditions are imposed through- 
out. The Trotter number is chosen to be P = 256 leading to an electronic temperature 
T = 7529f^. This is rather small on the electronic energy scale and we have explicitly 
verified that it is close to the T — >• limit for the model. 

The number of electrons, the kinetic and the potential energy as a function of the chemical 
potential are shown in Figure ^ Upon increasing /x the states are filled with electrons. At 
ji w —1.5 a.u. half of the states belonging to the first band are occupied and the system 
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FIG. 1: Particle number, kinetic and potential energy of the Gaussian model as a function of 
the chemical potential. The circles show the results calculated with the stochastic linear scal- 
ing algorithm. The lines are obtained from the smallest eigenvalues calculated with an iterative 
diagonalisation algorithm. 

is metallic. For /i m —1.2 a.u. the first band is completely filled and the model behaves as 
a large band gap insulator. The number of electrons, the kinetic and the potential energy 
are calculated with the stochastic algorithm and an iterative routine which calculates the 
largest relevant eigenvalues for compariso n 12 ! 13 ' 14 ' 15 . The agreement is excellent. In order 
to reach the required precision, averages needed to be taken over about 100 independent 
configurations. 

In a stochastic evaluation it is important to keep track of the error which for each property 
A is given by (Ja/VNmc where a a is its mean square fluctuation and Nmc the number of 
Monte Carlo steps. A number of physical observables together with their estimated oa 
is given in Table I. It is seen that the variances for the different observables do not differ 
qualitatively for metals and insulators. Thus the number of Monte Carlo steps does not have 
to be larger for metallic systems than for insulators. We have to mention that if I is close to 
P/2 the condition number of M(l) can be big in the metallic case. This could in principle 
increase the number of iterations needed to solve M(l)4> l j = ipj. In practice however this 
did not lead to significant problems and the performance of the algorithm in the half-filled 
and filled case are very similar. For fixed Trotter number P the algorithm scales linearly 
with the number of grid points as the lattice constant S is reduced. 

We now compare the scaling of the different algorithms introduced here with a standard 
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insulator 



metal 



(A) a A (A) a A 



64.0 2.7 32.0 4.5 



T 



77.2 5.4 36.7 4.1 



V -174.7 5.0 -86.1 4.2 



TABLE I: Average values and variances for the particle number N, the kinetic energy T, the 
potential energy V and the density at an atom site p. 



FIG. 2: Log-log plot of the CPU time t(N) versus the number of atoms N. The measured 
slopes confirm that the scaling is 0(N), 0(N 2 ), 0(N 7 ^ 3 ) and 0(N 3 ) for the stochastic algorithm, 
the iterative inversion, the banded LU decomposition and the partial iterative diagonalization 
respectively. The unit of time is = r(N)/N measured at large N. That is the asymptotic 
incremental cost of an extra atom in the — > oo limit. The computation was performed on a 1.7 
GHz Pentium4 Xenon processor. 

diagonalization procedure in which the largest relevant eigenvalues of e^^~ H ^ are computed 
with sparse matrix diagonalisation technique o 12llSl14115 . In Figure 121 we see that the pre- 
dictions made on the scaling of the different algorithms as a function of system size are 
confirmed by actual calculations. 

Comparing the performance of the stochastic method with numerically exact methods is 
not easy since the performance of the method depends on the accuracy required. Here, in 
comparing the performance for different system sizes we have instead kept the number of 
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100 

N 



1000 
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Monte Carlo steps constant. Had we kept the relative accuracy constant this would have 
led to sublinear scaling due to the self-averaging properties of the larger systems. With this 
caveat from Figure|2]one can see that our new algorithm based on stochastic matrix inversion 
techniques scales linearly, while the algorithm based on matrix diagonalisation shows a cubic 
scaling. For the model chosen the crossing point at which our method becomes more efficient 
is M ~ 38400, which corresponds to 600 atoms. It must be stressed that we have taken the 
worst case scenario since we are here dealing with metals and P has to be taken larger to 
reach the T = limit. At full filling convergence is reached almost for P = 128. For this 
P value the crossing point occurs at N = 450 atoms. For different systems this value can 
vary because it might be necessary to choose a larger value for the Trotter number or the 
acceptable statistical error should be smaller. Still our method can be expected to become 
more efficient than cubic scaling algorithms at system sizes of a few hundred atoms. It should 
be also mentioned that the algorithm is trivially parallelizable. Furthermore increasing the 
number of electrons while keeping the volume constant does not increase the computational 
cost of our stochastic approach. In contrast the deterministic methods at constant volume 
scale quadratically with the number of electrons. Another advantage of the present method 
is memory saving, which grows linearly with volume and not as the product of the number 
of electrons times the volume. 

In order to apply this method to fully self consistent DFT calculation one must take into 
account the fact that the evaluation of the electron density is affected by a statistical error. 
This will be considered at a later stage. As it is now the method can be profitably applied 
to tight binding calculations and to the Harris functional approximation 16 to DFT. It can 
also be extended to include the ionic positions in the sampling so as to obtain an efficient 
Monte Carlo method. 
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